Transcribing lectures and seminars
Authors
Abstract
This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL to develop a system that automatically transcribes lectures and seminars. We made use of widely available corpora to train both the acoustic and language models, since only a small amount of CHIL data was available for system development. For acoustic model training, we made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as the ICSI, ISL, and NIST meeting corpora. For language model training, text materials were extracted from a variety of on-line conference proceedings. Word error rates of about 25% are obtained on test data extracted from 12 seminars.
Similar resources
Usable speech recognition
A growing number of lecture webcasts are archived after being delivered live. In the absence of transcripts, users are faced with increased difficulty in performing tasks easily achieved with text documents (retrieval, browsing, skimming). Unfortunately, speech recognition systems do not perform satisfactorily when transcribing lectures. In this paper, we present an overview of the ePresence le...
The Effect of Transcribing on Beginning Learners’ Phonemic Perception
A large number of studies dealing with phonology have focused on phonological production at the expense of phonological perception, which provides the foundation for phonological production. This study focuses on phonological perception at the phonemic level. The purpose of the study is to help beginning learners improve their perception of the English phonemes which are confus...
Non-rigid image registration evaluation using common evaluation databases
faculty, staff and students in the Electrical and Computer Engineering department for many informative seminars and lectures.
Language modeling and transcription of the TED corpus lectures
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using, respectively, 8 hours of TED transcripts and ...
Students' perceived value of physiology course activities in a Sudanese medical faculty.
The physiology course in our department consists of lectures, laboratory sessions, and tutorials, all of which are teacher centered, as well as student-led seminars. The overall aim of this project was to investigate student perceptions of the value of varying academic activities on their learning of physiology. A faculty-based descriptive study was conducted at the Faculty of Medicine and Heal...